Factors such as network delay, latency and bandwidth significantly affect the
quality of communication using Voice over Internet Protocol (VoIP). A jitter
buffer at the receiving end compensates for varying network delay to some
extent, and the extra buffer delay given to each packet allows late packets to
be played, thereby improving voice quality. As the buffer delay increases, the
packet loss rate decreases, which in general is desirable. However, increasing
the buffer delay beyond a certain limit degrades the interactive quality of
voice communication. In this paper, we propose a statistical framework for
adaptive playout scheduling of voice packets based on network statistics,
packet loss rate and availability of packets in the buffer. Experimental
results show that the proposed model allocates optimal buffer delay with the
lowest packet loss rate when compared with other algorithms.

Keywords: Playout scheduling, PLC, VoIP
1. INTRODUCTION

Voice over Internet Protocol (VoIP) is a technology for transmitting voice over Internet Protocol
networks. The widespread use of the internet and low communication cost have made VoIP a very popular
technology. In VoIP, voice conversations are digitized and then packetized for transmission. These packets
are transmitted every 20 to 40 ms [1] and are expected to reach the receiver at the same interval. If the
packets arrive at the receiver without any delay variation, they can be played directly. But due to network
impairments like delay and latency, packets are lost or may not arrive in the order in which the sender
sent them. This causes the end users to experience broken, buzzing and delayed speech. Extensive research
is under way in this area to increase the quality of communication [2-4]. Many researchers have proposed
receiver-side Packet Loss Concealment (PLC) methods [5].

The variation in packet inter-arrival time is known as jitter. To mitigate its effect, a jitter buffer
is used at the receiver side. The jitter buffer holds each received packet until its estimated playout time
in order to accommodate late packets. Thus the playout of received packets is delayed to reduce the packet
loss rate. Increasing the buffer delay postpones the playout time of received packets, which reduces the
late packet loss rate but also reduces the interactivity of the communication. Decreasing the buffer delay,
conversely, increases the late packet loss rate. Many playout scheduling algorithms have been proposed to
resolve this trade-off. Playout scheduling schemes are classified as static (fixed) and adaptive. Static
playout scheduling is the simplest method: the playout delay of all the packets in a session is fixed
irrespective of the varying network delay [6]. Since network conditions are volatile, this method is not
very effective in reducing the packet loss rate.
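
The trade-off of static scheduling can be illustrated with a minimal sketch. It assumes synchronized clocks, a hypothetical five-packet trace, and a hypothetical helper `late_loss_rate`; none of these values come from the experiments reported later.

```python
def late_loss_rate(send_times_ms, recv_times_ms, fixed_delay_ms):
    """Fraction of packets that miss a fixed playout deadline.

    With static scheduling, packet i is played at send_time + fixed_delay,
    so it counts as late when it arrives after that instant.
    """
    late = sum(
        1
        for ts, tr in zip(send_times_ms, recv_times_ms)
        if tr > ts + fixed_delay_ms
    )
    return late / len(send_times_ms)


# Hypothetical trace: packets sent every 30 ms with varying network delay (ms).
send = [0, 30, 60, 90, 120]
delays = [40, 55, 120, 45, 80]
recv = [ts + d for ts, d in zip(send, delays)]

# A larger fixed delay loses fewer packets but postpones every playout.
for fixed in (50, 100, 150):
    print(fixed, late_loss_rate(send, recv, fixed))
```

In this toy trace a 50 ms delay loses three of five packets, while 150 ms loses none at the cost of a much longer mouth-to-ear delay, which is the tension an adaptive scheme tries to resolve.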
In adaptive playout scheduling algorithms, the playout time of the packets is adjusted adaptively
according to the network delay; that is, packets of the same talkspurt experience different playout times,
and the sequential playout of packets is maintained by PLC mechanisms such as extending or compressing the
silence periods. PLC approaches like signal reconstruction and timescale modification are also widely used
for the sequential playout of packets at the receiver side. The efficiency of adaptive playout scheduling
mechanisms depends on accurately estimating the buffer delay needed by the arriving voice packets so
that late packets can also be played. In this paper, we propose a framework for adaptive playout scheduling
of voice packets that estimates the buffer delay of the packets based on network statistics, packet loss
rate and availability of packets in the buffer.

Several methods have been developed for estimating the adaptive playout time of packets. Ramjee
et al. [7] proposed four algorithms for adaptive playout delay estimation of voice packets. In this method,
the playout time of the first packet of the talkspurt is estimated using an autoregressive estimate, and an
offset is added to calculate the playout time of successive packets in the same talkspurt. However, the
delay adaptation is applied only to the first packet of the talkspurt, and the buffer size is increased
exponentially to store late packets. Pinto et al. [8] proposed an adaptive gap-based algorithm that can be
tuned for both end-to-end delay and packet loss to satisfy a user-desired tolerance.

The method proposed by Liu et al. [9] adaptively adjusts the jitter buffer according to an indication
of the sequence number of the delayed packet. Approaches based on order statistics [10], network
variations [11-12] and quality-driven playout buffer optimization [13] have also been proposed.
B. H. Kim et al. [12] used a static buffer size of 200 ms to hold packets. The event counting algorithm
proposed in [14] relies on the measurement of relative times in handling object-related events. A normal
approximation is used to identify the network delay in [15]. A modified enhanced normalized least mean
squares (ENLMS) algorithm is also used to identify the spike state of the network [16].

Different methods have been proposed by researchers to solve the issues with adaptive playout
scheduling in VoIP communication. S. B. Moon et al. [17] computed upper and lower bounds on the
minimum average playout delay for a given packet loss by maintaining delay percentile information. M. K.
Ranganathan and L. Kilmartin [18] proposed a novel fuzzy trend analyzer system to estimate intra-talkspurt
playout delay adaptation. Sreenan and Cormac [19] suggested histogram-based approaches for
synchronizing networked multimedia streams.


2. PROPOSED METHOD 
In our proposed method, we perform the following operations to implement adaptive playout 
scheduling. 
a. Computation of network statistics 
b. Estimation of playout time based on optimal buffer delay 
The following subsection gives a brief explanation of adaptive playout scheduling.


2.1. Adaptive Playout Scheduling 

In adaptive playout scheduling algorithms, the playout time of the packets is adaptively adjusted
during silence periods and within talkspurts. This is mainly achieved by scaling voice packets within
talkspurts while keeping the playout of packets synchronous. Each packet in the talkspurt experiences a
different playout time. Buffer delay estimation is a major component in calculating the playout time of
packets. The trade-off between buffer delay and late packet loss rate can be resolved using an efficient
buffer delay estimation method. We assume that the sender and receiver clocks are synchronized. As
packetization delay and codec delay are codec dependent, neither is taken into account while
computing the playout time of the packets. The playout time of packet $i$, $tp_i$, is the time at which the
packet is played at the receiver, and the network delay, $dn_i$, is the time gap between the sending and
receiving times of the packet.


$$tp_i = tr_i + db_i \qquad (1)$$
$$dn_i = tr_i - ts_i \qquad (2)$$


where $tr_i$, $db_i$ and $ts_i$ are the receiving time, buffer delay and sending time respectively. The
buffering time is estimated for each packet in a talkspurt to calculate the playout time. When the
buffering time increases, the playout time of the packet also increases. Even though the packet loss rate
decreases as the buffer delay increases, excessive buffering reduces the voice quality, and listeners
experience broken, buzzing and delayed speech. The total delay experienced by a packet, $dt_i$, is the
difference between the time it is sent from the sender and the time it is played on the receiver side. It
is the sum of the network delay and the buffering delay and is given in Equation (3).


$$dt_i = dn_i + db_i \qquad (3)$$

The jitter of the $i$-th packet, $j_i$, for a talkspurt is calculated as

$$j_i = dn_i - dn_{i-1} = (tr_i - ts_i) - (tr_{i-1} - ts_{i-1}) \qquad (4)$$
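
Equations (1) to (4) can be sketched in a few lines. The `Packet` type, the function names and the sample timestamps below are illustrative assumptions, with all times in milliseconds on a shared clock.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int    # sequence number i (hypothetical field name)
    ts: float   # sending time ts_i in ms
    tr: float   # receiving time tr_i in ms


def network_delay(p: Packet) -> float:
    """Equation (2): dn_i = tr_i - ts_i."""
    return p.tr - p.ts


def playout_time(p: Packet, db: float) -> float:
    """Equation (1): tp_i = tr_i + db_i."""
    return p.tr + db


def total_delay(p: Packet, db: float) -> float:
    """Equation (3): dt_i = dn_i + db_i."""
    return network_delay(p) + db


def jitter(cur: Packet, prev: Packet) -> float:
    """Equation (4): j_i = dn_i - dn_{i-1}."""
    return network_delay(cur) - network_delay(prev)


# Two consecutive packets sent 30 ms apart (made-up values).
prev = Packet(seq=0, ts=0.0, tr=40.0)
cur = Packet(seq=1, ts=30.0, tr=85.0)
print(jitter(cur, prev), playout_time(cur, db=20.0), total_delay(cur, db=20.0))
```

A negative value of `jitter` means the current packet experienced less network delay than its predecessor.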


2.2. Computation of Network Statistics and Optimal Buffer Delay 

The main idea behind our approach is to compute the playout time of a packet using a buffering
delay derived from the network statistics of packets that have already arrived. We use a vector to store
the details of these packets. In order to track the latest network conditions, old packet information is
removed periodically from the vector and replaced with the latest packet information. This window-based
approach ensures that the estimates always adapt to the varying network conditions. In addition to the past
network statistics, the packets waiting in the buffer and the late packet loss rate are also considered as
factors in the buffer delay computation.
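
One way to picture the window-based bookkeeping is the sketch below. It is a simplification under stated assumptions: the class name `DelayWindow` is hypothetical, the oldest entry is evicted on every insertion rather than periodically, and plain mean and population variance stand in for the mode-dependent statistics described later.

```python
from collections import deque
from statistics import mean, pvariance


class DelayWindow:
    """Keeps the network delays of the most recent packets (hypothetical helper)."""

    def __init__(self, size: int = 100):
        # deque with maxlen drops the oldest entry automatically when full.
        self._delays = deque(maxlen=size)

    def add(self, delay_ms: float) -> None:
        self._delays.append(delay_ms)

    def stats(self):
        """Return (mean, variance, standard deviation) of the window, or None if empty."""
        if not self._delays:
            return None
        m = mean(self._delays)
        var = pvariance(self._delays)
        return m, var, var ** 0.5


window = DelayWindow(size=3)
for d in (10.0, 20.0, 30.0, 40.0):
    window.add(d)
print(window.stats())  # statistics over the last three delays only
```

Bounding the window keeps the per-packet cost fixed and lets old network conditions age out of the estimate.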

The normal and spike modes of the network state are identified for each packet based on the network
delay, $dn_i$, of the packet and the threshold value, $th_{spike}$ [12]. If a packet is received with a delay
greater than $th_{spike}$, the current mode is identified as spike. Otherwise, the mode is taken as normal.
The network delay is estimated using Equation (5):


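$$\hat{dn}_i = \alpha_i \cdot \hat{dn}_{i-1} + (1 - \alpha_i) \cdot dn_i \qquad (5)$$


where $\alpha_i$ is a weighting factor that depends on the delay statistics [7] and $\hat{dn}_i$ denotes the
estimated network delay. Using this estimate, the playout time of packet $i$ is computed as given in
Equation (6).


$$tp_i = ts_i + \hat{dn}_i + db_i \qquad (6)$$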


Accurate estimation of the buffer delay for the playout time calculation helps to accommodate late packets
in the buffer for playing, thereby reducing the late packet loss rate. For this, we make the value of $db_i$
proportional to the fluctuating network delay, the late loss rate and the availability of packets in the
buffer.
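
The mode test and Equations (5) and (6) can be sketched as follows. The threshold, the weighting factor and the timestamps are illustrative assumptions; in the method itself the weighting factor depends on the delay statistics [7] rather than being a constant.

```python
def classify_mode(dn: float, th_spike: float) -> str:
    """A packet whose delay exceeds the threshold puts the network in spike mode."""
    return "spike" if dn > th_spike else "normal"


def update_delay_estimate(prev_estimate: float, dn: float, alpha: float) -> float:
    """Equation (5): weighted combination of the previous estimate and the new delay."""
    return alpha * prev_estimate + (1.0 - alpha) * dn


def playout_time_from_estimate(ts: float, dn_estimate: float, db: float) -> float:
    """Equation (6): sending time plus estimated network delay plus buffer delay."""
    return ts + dn_estimate + db


# Hypothetical values, all in ms.
estimate = update_delay_estimate(prev_estimate=40.0, dn=60.0, alpha=0.75)
print(classify_mode(60.0, th_spike=100.0), estimate)
print(playout_time_from_estimate(ts=30.0, dn_estimate=estimate, db=20.0))
```

A weighting factor close to 1 makes the estimate react slowly to new delays, while a smaller value tracks sudden changes more closely.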

Buffer delay is estimated using equation (7). 


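$$db_i = w \cdot J_i \cdot L_{late} \qquad (7)$$


where $w$ is the delay factor, which depends on the availability of preceding and succeeding packets, and
$L_{late}$ is the late packet loss rate. The network jitter $J_i$ of the $i$-th packet is estimated as


$$J_i = m_i + \varphi_i \cdot var_i \qquad (8)$$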


where $\varphi_i$ is the weighting factor of the inter-arrival jitter variance, and $m$ and $var$ are the
average and variance of the inter-arrival jitter, respectively. For every packet that arrives at the
receiver, the algorithm checks the current and previous modes, and the values of $m$ and $var$ are calculated
differently in each mode using separate equations as given in [12]. In this way, quickly varying network
conditions are tracked more accurately in the estimation of the playout time. The value of $\varphi_i$ is
proportional to the network characteristics and is calculated as:


$$\varphi_i = \frac{m_{i-1} \;\text{[operator illegible in source]}\; \sigma_{i-1}}{var_{i-1}} \qquad (9)$$


where $\sigma$ is the standard deviation. The values of $m$, $var$ and $\sigma$ are calculated over all the
packets stored in the window in order to predict the current network state. The size of the window is
specified in Section 3. In addition to the packet loss rate, our method also considers the order of already
received packets while predicting the buffer delay of the $i$-th packet. The calculated jitter can be
negative if packets are received out of order. We use a delay factor $w$ to adjust the buffer delay of the
expected packet, and its value varies with the scenarios described above. The value of $w$ is lower, low,
high or higher according to the availability of preceding and succeeding packets. For example, if we check
the availability of only the immediately preceding and succeeding packets, we assign a value to $w$ based on
the following four cases.

a. Both $p_{i-1}$ and $p_{i+1}$ are received: $0 < lower < 1$
b. $p_{i-1}$ is received and $p_{i+1}$ is not received: $0 < low < 1$
c. $p_{i-1}$ is not received, but $p_{i+1}$ is received: $1 < high < 2$
d. Neither $p_{i-1}$ nor $p_{i+1}$ is received: $1 < higher < 2$
In this way, the value assigned to $w$ serves as one parameter for controlling the buffer delay of packets.
For example, if both the succeeding and preceding packets have already arrived, then a small value is
assigned to $w$. The proposed algorithm for estimating the playout time of packets is given as:
Algorithm: Estimation of Playout Time
1. Receive packets
2. for (each packet in the talkspurt)
3.     Compute network delay
4.     Estimate the jitter and delay factor
5.     Estimate buffer delay
6.     Calculate playout time
7. endfor
A packet $p_k$ is stored or discarded according to its receiving time and playout time:

$$p_k \rightarrow \begin{cases} \text{store in buffer}, & \text{if } tr_k < tp_k \\ \text{discard}, & \text{otherwise} \end{cases}$$
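
The buffer delay estimation and the store-or-discard rule can be sketched as below. Several points are assumptions made only for illustration: the four numeric values of the delay factor are arbitrary picks inside the stated ranges, only the immediate neighbours are checked, the jitter weighting factor is passed in by the caller, and the scale of the late loss rate (fraction or percent) is left to the caller because the text does not fix it.

```python
def delay_factor(prev_received: bool, next_received: bool,
                 lower: float = 0.25, low: float = 0.75,
                 high: float = 1.25, higher: float = 1.75) -> float:
    """Pick w from the four neighbour-availability cases (values are hypothetical)."""
    if prev_received and next_received:
        return lower
    if prev_received:
        return low
    if next_received:
        return high
    return higher


def estimated_jitter(m: float, var: float, phi: float) -> float:
    """Equation (8): mean jitter plus the weighted jitter variance."""
    return m + phi * var


def buffer_delay(w: float, jitter_est: float, late_loss_rate: float) -> float:
    """Equation (7): product of delay factor, estimated jitter and late loss rate."""
    return w * jitter_est * late_loss_rate


def store_in_buffer(tr: float, tp: float) -> bool:
    """A packet is kept only if it arrives before its playout time."""
    return tr < tp


# Hypothetical inputs: preceding packet missing, succeeding packet present.
w = delay_factor(prev_received=False, next_received=True)
db = buffer_delay(w, estimated_jitter(m=10.0, var=4.0, phi=0.5), late_loss_rate=2.0)
print(w, db, store_in_buffer(tr=100.0, tp=105.0))
```

A missing neighbour raises the delay factor above 1, which lengthens the buffer delay and gives the outstanding packet more time to arrive.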


3. RESULTS AND ANALYSIS 

We evaluated the proposed algorithm for playout time estimation using the experimental setup [20]
shown in Figure 1. The four modules used in this setup are a Session Initiation Protocol (SIP)
application, a network traffic emulator, a VoIP server and a packet analyzer. We tested VoIP
communication between two endpoints, endpoint1 and endpoint2. A VoIP server installed on a computer is
used to test the call flows between the two endpoints. One endpoint is a mobile device registered with
the server using a VoIP application. A PJSIP application running on a computer is used as the other
endpoint. The program for network statistics based playout time estimation is written in C. The network
emulator is used to set different network parameters such as jitter, delay, packet reordering and packet
loss in the RTP packet flow. The two endpoints act as SIP clients, and the call flows between these two SIP
clients were analyzed using the packet analyzer.


Figure 1. Experimental setup 


The Real-time Transport Protocol (RTP) packets sent from endpoint1 are redirected through the VoIP server
(SIP proxy server) and then sent to endpoint2. Recorded speech samples from the database provided by the
International Telecommunication Union [21] are used for our experimental study. The performance of our
method is evaluated using two metrics, the average buffering delay ($db_{avg}$) and the late packet loss
rate ($L_{late}$).


$$db_{avg} = \frac{1}{|P|} \sum_{i=1}^{|P|} db_i \qquad (17)$$


where P is the set of packets played. 


$$L_{late} = \frac{|R| - |P|}{|R|} \qquad (18)$$


where $R$ is the set of packets received.
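
The two metrics can be computed as in the sketch below. The function names and sample numbers are hypothetical, and dividing by the number of received packets and scaling to percent are assumptions chosen to match the percentage column reported in Table 1.

```python
def average_buffer_delay(played_buffer_delays_ms):
    """Equation (17): mean buffer delay over the packets that were played."""
    return sum(played_buffer_delays_ms) / len(played_buffer_delays_ms)


def late_loss_rate_percent(num_received: int, num_played: int) -> float:
    """Equation (18) as a percentage: received but unplayed packets over received packets."""
    return 100.0 * (num_received - num_played) / num_received


# Made-up example: 200 packets received, 195 of them played in time.
print(average_buffer_delay([30.0, 40.0, 50.0]))
print(late_loss_rate_percent(num_received=200, num_played=195))
```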
Also, to evaluate communication quality, an objective voice quality test is carried out using the
E-model [22]. The transmission quality rating in the E-model varies between 0 and 100 and is expressed by
the Transmission Rating factor (R factor), where 0 represents bad quality and 100 represents extremely good
quality of communication. The R factor is then mapped to a Mean Opinion Score (MOS) to assess
conversational quality.
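
The text does not spell out the mapping from R factor to MOS. The sketch below assumes the standard conversion given with the E-model in ITU-T G.107; the function name `r_to_mos` is hypothetical.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model R factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    # Cubic mapping used between the two saturation points.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for r in (0, 60, 80, 100):
    print(r, round(r_to_mos(r), 3))
```

The mapping saturates at 1.0 and 4.5, so an R factor near 80 already corresponds to a MOS of about 4.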

We used the network emulator to create traces with varying jitter. The one-way delay of the
communication is measured for each packet with a payload size of 160 bytes, and the G.711 coded voice
packets are generated at the sender at 30 ms intervals. The performance of our proposed method is compared
with three different adaptive playout scheduling schemes. Method 1 uses an autoregressive estimate to
predict the network delay and jitter and is given as Algorithm 4 in [7], method 2 uses the self-adaptive
jitter buffer adjustment mechanism discussed in [9], and method 3 is based on linear prediction with a
static buffer size [12]. We denote method 1, method 2, method 3 and the proposed method by auto_reg,
self_adapt, linear_pred and PM respectively.

The playout delay estimation algorithm is executed for every packet in the talkspurt. Since the
packetization interval of audio packets is taken as 30 ms [1], the algorithm must be efficient enough to
execute up to 34 times per second. The computational cost of the network statistics also increases with the
window size, whereas a smaller window makes the adaptation more responsive to volatile network
behavior [7] [14]. So it is important to select a window size that adapts quickly and still yields accurate
estimates of the network characteristics. After experimenting with different window sizes, we kept a
window of 100 packets, which resulted in optimal performance. For the delay factor calculation, we
checked the availability of three succeeding and two preceding packets. Six speech samples were collected
for analysis, and Table 1 lists the network trace statistics along with the experimental results. Among the
six traces collected, trace 1 has small jitter, traces 2, 3 and 4 have medium jitter, and traces 5 and 6
have large jitter. The standard deviation (STD) of the network delay varies across traces. The maximum
jitter experienced in trace 1, trace 2, trace 3, trace 4, trace 5 and trace 6 is 51, 125, 160, 180, 240 and
325 ms respectively. For the small jitter case (trace 1), the average buffer delay is close to the ideal
case (30 ms), and as the jitter increases, the buffer delay also increases. The performance of the four
algorithms on the six traces is depicted in Figure 2. For the trace with the largest jitter, 325 ms, the
proposed method gives a higher MOS score than the other three algorithms.


Table 1. Experimental Results

Trace  STD of network  Maximum      Method       db_avg  L_late  MOS
       delay (ms)      jitter (ms)  used         (ms)    (%)
1      [illegible]     51           auto_reg     42.41    5.46   3.41
                                    self_adapt   38.53    2.67   3.51
                                    linear_pred  38.73    2.13   3.78
                                    PM           35.15    1.32   4.01
2      53.96           125          auto_reg     55.13    7.23   2.91
                                    self_adapt   49.38    4.25   3.43
                                    linear_pred  48.21    3.97   3.52
                                    PM           44.37    2.53   3.72
3      [illegible]     160          auto_reg     56.28    8.62   2.99
                                    self_adapt   48.46    4.67   3.76
                                    linear_pred  49.84    5.14   3.47
                                    PM           44.72    3.63   3.61
4      95.67           180          auto_reg     56.48    9.01   2.99
                                    self_adapt   50.76    8.58   3.76
                                    linear_pred  50.24    8.21   3.47
                                    PM           44.97    7.03   3.61
5      27.13           240          auto_reg     61.24    9.27   2.42
                                    self_adapt   55.72    8.24   2.63
                                    linear_pred  54.57   10.67   2.75
                                    PM           51.36    7.72   3.02
6      29.81           325          auto_reg     79.63   12.82   2.17
                                    self_adapt   72.52   10.15   2.64
                                    linear_pred  72.41   11.21   2.62
                                    PM           64.56    8.33   2.95

The performance of the four adaptive playout scheduling algorithms for all the traces is depicted in
Figure 2. In all cases, the proposed method results in the lowest late loss rate and buffering delay. As the
jitter increases, the average buffering delay is increased to reduce the late packet loss rate.

Figure 2. Performance of four adaptive playout scheduling algorithms for six different traces


The late packet loss rate and the average buffering delay of the algorithms for the traces are compared in
Figure 3(a) and Figure 3(b) respectively. In all cases, the proposed method results in the lowest late
loss, buffer loss and buffering delay.

Figure 3. (a) Late packet loss rate (b) Average buffering delay of six traces


Figure 4 shows the MOS scores of all the traces for the small, medium and large jitter cases. The MOS
scores obtained from the experiments indicate that for all the traces speech quality is significantly
improved by the proposed algorithm. The experimental results show that the proposed method estimates the
playout time of packets to match the jitter variations while keeping the buffering delay to a minimum in
order to maintain interactive voice quality.

Figure 4. MOS score of different algorithms


4. CONCLUSION 

Widespread use of the internet is replacing traditional telecommunication systems with VoIP services.
Rendering high quality service to users is a crucial challenge. Network delay is one major
impairment factor which affects the quality of communication. It causes the late arrival or loss
of packets, thereby degrading the voice quality. In this paper, we propose a model based on network
statistics, packet loss rate and packet availability in the buffer to estimate the playout time of each
packet. We have compared our algorithm with three existing adaptive playout scheduling algorithms, and the
results indicate that our method resolves the trade-off between buffer delay and packet loss better than
the existing adaptive playout scheduling methods, with minimum packet loss rate and buffering delay. The
communication quality is also evaluated using an objective voice quality test, the E-model, and our
algorithm achieves a higher MOS score, for all the traces, than the other three algorithms.